Abstract: We consider a distributed storage problem in a
large-scale wireless sensor network with n nodes, among which
k acquire (sense) independent data. The goal is to disseminate
the acquired information throughout the network so that each of
the n sensors stores one possibly coded packet and the original k
data packets can be recovered later in a computationally simple
way from any $(1 + \epsilon)k$ nodes for some small $\epsilon > 0$. We
propose two Raptor-code-based distributed storage algorithms
for solving this problem. In the first algorithm, all the sensors
know n and k. In the second, we assume
that no sensor has such global information.

I. Introduction 

We consider a distributed storage problem in a large-scale 
wireless sensor network with n nodes among which k sensor 
nodes acquire (sense) independent data. Since sensors are 
usually vulnerable due to limited energy and hostile environ- 
ment, it is desirable to disseminate the acquired information 
throughout the network so that each of the n sensors stores 
one possibly coded packet and the original k source packets 
can be recovered later in a computationally simple way from 
any $(1 + \epsilon)k$ nodes for some small $\epsilon > 0$. No sensor knows
locations of any other sensors except for their own neighbors, 
and they do not maintain any routing information (e.g., routing 
tables or network topology). 

Algorithms that solve such problems using coding in a 
centralized way are well known and understood. In a sensor 
network, however, this is much more difficult, since we need 
to find a strategy to distribute the information from multiple 
sources throughout the network so that each sensor admits 
desired statistics of data. In [7], Lin et al. proposed an 
algorithm that uses random walks with traps to disseminate 
the source packets in a wireless sensor network. To achieve 
desired code degree distribution, they employed the Metropolis 
algorithm to specify transition probabilities of the random 
walks. While the methods proposed in [7] are promising, they
require every sensor to know the total number of sensors n and the
number of sources k. Another type of global information, the maximum
node degree (i.e., the maximum number of neighbors) of the
graph, is also required to perform the Metropolis algorithm.
For a large-scale sensor network, however, such
global information may not be easy for each
individual sensor to obtain, especially when the topology
may change.



In [1], [2], we proposed Luby Transform (LT) codes based 
distributed storage algorithms for large-scale wireless sensor 
networks to overcome these difficulties. In this paper, we 
extend this work to Raptor codes and demonstrate their 
performance. In particular, we propose two new decentralized
algorithms, Raptor Code Distributed Storage (RCDS-I and
RCDS-II), that distribute information sensed by k source
nodes to n nodes for storage based on Raptor codes. In
RCDS-I, each node has limited global information; while in 
RCDS-II, no global information is required. We compute the 
computational encoding and decoding complexity of these 
algorithms as well as evaluate their performance by simulation. 

II. Wireless Sensor Networks and Fountain Codes 
A. Network Model 

Suppose that the wireless sensor network consists of n 
nodes that are uniformly distributed at random in a region 
$A = [L, L]^2$. Among these n nodes, there are k source
nodes that have information to be disseminated throughout 
the network for storage. These k nodes are uniformly and 
independently chosen at random among the n nodes. Usually, 
the fraction of source nodes. 

We assume that no node has knowledge about the locations 
of other nodes and no routing table is maintained, and thus that 
the algorithm proposed in [4] cannot be applied. Moreover, 
besides the neighbor nodes, we assume that each node has 
limited or no knowledge of global information. The limited 
global information refers to the total number of nodes n, and 
the total number of sources k. Any further global information, 
for example, the maximal number of neighbors in the network, 
is not available. Hence, the algorithms proposed in [5]-[7] are 
not applicable. 

Definition 1: (Node Degree) Consider a graph $G = (V, E)$,
where V and E denote the set of nodes and links, respectively.
Given $u, v \in V$, we say u and v are adjacent (or u is adjacent
to v, and vice versa) if there exists a link between u and v,
i.e., $(u, v) \in E$. In this case, we also say that u and v are
neighbors. Denote by $\mathcal{N}(u)$ the set of neighbors of a node u.
The number of neighbors of a node u is called the node degree
of u, and denoted by $d_n(u)$, i.e., $|\mathcal{N}(u)| = d_n(u)$. The mean
degree of a graph G is then given by $\mu = \frac{1}{|V|} \sum_{u \in G} d_n(u)$.

Fig. 1. The encoding operations of Fountain codes: each output is obtained
by XORing d source blocks chosen uniformly and independently at random
from k source inputs, where d is drawn according to a probability distribution
$\Omega(d)$.

B. Fountain Codes and Raptor Codes 

Definition 2: (Code Degree) For Fountain codes, the number
of source blocks used to generate an encoded output y
is called the code degree of y, and denoted by $d_c(y)$. The
code degree distribution $\Omega(d)$ is the probability distribution of
$d_c(y)$.

For k source blocks $\{x_1, x_2, \ldots, x_k\}$ and a probability
distribution $\Omega(d)$ with $1 \le d \le k$, a Fountain code with
parameters $(k, \Omega)$ is a potentially limitless stream of output
blocks $\{y_1, y_2, \ldots\}$. Each output block is obtained by XORing
d randomly and independently chosen source blocks, where d
is drawn from the degree distribution $\Omega(d)$. This is illustrated
in Fig. 1.
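
The encoding rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fountain_encode`, the byte-string block format, and the toy degree distribution are hypothetical, and the d source blocks are drawn without replacement, which the text does not specify.

```python
import random
from typing import List, Sequence, Tuple


def fountain_encode(blocks: Sequence[bytes], degree_probs: Sequence[float],
                    rng: random.Random) -> Tuple[List[int], bytes]:
    """Produce one output block as the XOR of d source blocks.

    degree_probs[i] is the probability that the code degree equals i + 1.
    Assumption: the d blocks are distinct (sampling without replacement).
    """
    k = len(blocks)
    degrees = range(1, len(degree_probs) + 1)
    d = min(rng.choices(degrees, weights=degree_probs)[0], k)
    chosen = sorted(rng.sample(range(k), d))
    out = bytearray(len(blocks[0]))
    for idx in chosen:
        for pos, byte in enumerate(blocks[idx]):
            out[pos] ^= byte
    return chosen, bytes(out)


# Four 2-byte source blocks and a toy distribution over degrees 1..3.
blocks = [bytes([i, i + 1]) for i in (10, 20, 30, 40)]
neighbors, y = fountain_encode(blocks, [0.25, 0.5, 0.25], random.Random(0))
```

Calling `fountain_encode` repeatedly yields the potentially limitless stream of output blocks; the returned index list plays the role of the encoding information that a decoder needs for each block.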

Raptor codes are a class of Fountain codes with linear 
encoding and decoding complexity [10], [11]. The key idea 
of Raptor codes is to relax the condition that all input blocks 
need to be recovered. If an LT code needs to recover only a 
constant fraction of its input blocks, its decoding complexity 
is $O(k)$, i.e., linear time decoding. Then, we can recover all
input blocks by concatenating a traditional erasure correcting
code with an LT code. This is called pre-coding in Raptor
codes, and can be accomplished by a modern block code such
as an LDPC code. This process is illustrated in Fig. 2.

The pre-code $C_m$ used in this paper is the randomized
LDPC (Low-Density Parity-Check) code that is studied as
one type of pre-code in [10]. In this randomized LDPC
code, we have k source blocks and m pre-coding output
blocks. Each source block chooses d pre-coding output blocks
uniformly and independently at random, where d is drawn from a
distribution $\Omega_L(d)$. Each pre-coding output block combines
the "incoming" source blocks to obtain its encoded output.
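
As a rough illustration of this randomized pre-coding step, the sketch below lets every source block pick its d output positions and XORs the incoming blocks at each position. The function name `precode`, the use of integers as blocks, and the fixed degree used in the example are assumptions made for illustration; the actual distribution $\Omega_L(d)$ is the one studied in [10].

```python
import random
from typing import List, Sequence


def precode(source: Sequence[int], m: int, degree_probs: Sequence[float],
            rng: random.Random) -> List[int]:
    """Randomized pre-coding sketch: blocks are ints, combined by XOR.

    degree_probs[i] is the probability that a source block is sent to
    i + 1 distinct pre-coding outputs (a stand-in for Omega_L).
    """
    outputs = [0] * m
    degrees = range(1, len(degree_probs) + 1)
    for block in source:
        d = min(rng.choices(degrees, weights=degree_probs)[0], m)
        for j in rng.sample(range(m), d):
            outputs[j] ^= block  # output j combines every block it receives
    return outputs


# Three source blocks, five pre-coding outputs, every source has degree 3.
out = precode([0b0001, 0b0010, 0b0100], m=5,
              degree_probs=[0.0, 0.0, 1.0], rng=random.Random(1))
```

With an odd fixed degree, the XOR of all outputs equals the XOR of all sources, which gives a quick sanity check on the bookkeeping.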

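The code degree distribution $\Omega_r(i)$ of Raptor codes for LT
coding is a modification of the Ideal Soliton distribution and
given by

$$\Omega_r(i) = \Pr(d = i) = \begin{cases} \dfrac{\rho}{1+\rho}, & i = 1, \\ \dfrac{1}{i(i-1)(1+\rho)}, & i = 2, \ldots, D, \\ \dfrac{1}{D(1+\rho)}, & i = D + 1, \end{cases} \qquad (1)$$

where $D = \lceil 4(1 + \epsilon)/\epsilon \rceil$ and $\rho = (\epsilon/2) + (\epsilon/2)^2$.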
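
For concreteness, the degree distribution can be tabulated numerically. The sketch below assumes the three-piece form given in [10] (weight on degree 1, Ideal-Soliton-like weights on degrees 2 to D, and a spike at degree D + 1) with D and $\rho$ as defined above; the function name is hypothetical.

```python
import math
from typing import List


def raptor_degree_distribution(eps: float) -> List[float]:
    """Return [Omega_r(1), ..., Omega_r(D + 1)] for a given epsilon.

    Assumed form: rho/(1+rho) at i = 1, 1/(i(i-1)(1+rho)) for 2 <= i <= D,
    and 1/(D(1+rho)) at i = D + 1.
    """
    D = math.ceil(4 * (1 + eps) / eps)
    rho = eps / 2 + (eps / 2) ** 2
    probs = [rho / (1 + rho)]
    probs += [1 / (i * (i - 1) * (1 + rho)) for i in range(2, D + 1)]
    probs.append(1 / (D * (1 + rho)))
    return probs


probs = raptor_degree_distribution(0.5)
```

The telescoping sum of $1/(i(i-1))$ from 2 to D equals $1 - 1/D$, so the three pieces add up to one, which the code reproduces up to floating-point error.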

The following result provides the performance of the Raptor 
codes [10], [11]. 

Lemma 1 (Shokrollahi [10], [11]): Let $R_0 = (1 +
\epsilon/2)/(1 + \epsilon)$, and $C_m$ be the family of codes of rate $R_0$. Then,
the Raptor code with pre-code $C_m$ and LT codes with degree
distribution $\Omega_r(d)$ has a linear time encoding algorithm. With
$(1 + \epsilon)k$ encoded output blocks, the BP decoding algorithm has
a linear time complexity. More precisely, the average number
of operations to produce an output symbol is $O(\log(1/\epsilon))$,
and the average number of operations to recover the k source
symbols is $O(k \log(1/\epsilon))$.

Fig. 2. The encoding operations of Raptor codes: k source blocks are first
encoded to m pre-coding output blocks by LDPC coding, and then the final
encoded output blocks are obtained by applying LT codes to these m
pre-coding output blocks with degree distribution $\Omega_r(d)$.

III. Raptor Codes Based Distributed Storage 
(RCDS) Algorithms 

As shown in [1], [7], distributed LT codes are relatively
simple to implement. Raptor codes take advantage of
LT codes to decode a major fraction of the k source packets
with linear complexity, and use another error correcting code
to decode the remaining minor fraction, also with linear
complexity, by concatenating this error correcting code
with the LT code [10].

Nevertheless, it is not trivial to achieve this encoding 
mechanism in a distributed manner. In this section, we propose 
two algorithms for distributed storage based on Raptor codes. 
The first is called RCDS-I, in which each node has knowledge 
of limited global information. The second is called RCDS-II, 
which is a fully distributed algorithm and does not require any 
global information. 

A. With Limited Global Information — RCDS-I 

In RCDS-I, we assume that each node in the network knows
the value of k, the number of sources, and the value of n,
the number of nodes. We use a simple random walk [9] for each
source to disseminate its information to the whole network. At
each round, each node u that has packets to transmit chooses
one node v among its neighbors uniformly and independently at
random, and sends the packet to the node v. To avoid the
local-cluster effect, in which each source packet is most likely
trapped by its neighbor nodes, we make acceptance
of any source packet equiprobable at each node. To achieve this, we also
need each source packet to visit each node in the network at
least once.

Definition 3: (Cover Time) Given a graph G, let $T_{cover}(u)$
be the expected length of a random walk that starts at node u
and visits every node in G at least once. The cover time of G
is defined by $T_{cover}(G) = \max_{u \in G} T_{cover}(u)$ [9].

Lemma 2 (Avin and Ercal [3]): Given a random geometric
graph with n nodes, if it is a connected graph with high
probability, then $T_{cover}(G) = \Theta(n \log n)$.

Therefore, we can set a counter for each source packet and
increase the counter by one after each forward transmission
until the counter reaches some threshold $C_1 n \log n$ to guarantee
that the source packet visits each node in the network at
least once.
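
The counter mechanism can be illustrated with a small simulation. The sketch below is an assumption-laden toy: it uses a 4 x 4 grid rather than a random geometric graph, takes the natural logarithm in the threshold, and uses hypothetical names (`grid_graph`, `walk_with_counter`). It simply forwards one packet until its counter reaches the threshold and records which nodes were visited.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def grid_graph(side: int) -> Dict[int, List[int]]:
    """Adjacency lists of a side x side grid (stand-in for a sensor field)."""
    adj: Dict[int, List[int]] = {i: [] for i in range(side * side)}
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if c + 1 < side:
                adj[i].append(i + 1)
                adj[i + 1].append(i)
            if r + 1 < side:
                adj[i].append(i + side)
                adj[i + side].append(i)
    return adj


def walk_with_counter(adj: Dict[int, List[int]], start: int, c1: float,
                      rng: random.Random) -> Tuple[Set[int], int]:
    """Forward a packet until its counter reaches ceil(c1 * n * ln n)."""
    n = len(adj)
    threshold = math.ceil(c1 * n * math.log(n))
    node, counter, visited = start, 0, {start}
    while counter < threshold:
        node = rng.choice(adj[node])  # uniform choice among neighbors
        counter += 1                  # one forward transmission
        visited.add(node)
    return visited, counter


visited, steps = walk_with_counter(grid_graph(4), start=0, c1=5.0,
                                   rng=random.Random(7))
```

Comparing `len(visited)` with the number of nodes for several values of `c1` gives an empirical feel for how large the constant must be before the walk covers the whole graph.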

To perform the LDPC pre-coding mechanism for k sources 
in a distributed manner, we again use simple random walks to 
disseminate the source packets. Each source node generates b
copies of its own source packet, where b follows the degree distribution
$\Omega_L(d)$ of the randomized LDPC code. After these b copies are
sent out and distributed uniformly in the network, each node 
among m nodes chosen as pre-coding output nodes absorbs 
one copy of this source packet with some probability. In this 
way, we have m pre-coding output nodes, each of which 
contains a combined version of a random number of source 
packets. Then, the above method can be applied for these 
m pre-coding output nodes as new sources to do distributed 
Raptor encoding. In this way, we can achieve distributed 
storage packets based on Raptor codes. The RCDS-I algorithm 
is described in the following steps. 

(i) Initialization Phase: 

(1) Each node u in the network draws a random number
$d_c(u)$ according to the distribution given by (1).

(2) Each source node $s_i$, $i = 1, \ldots, k$, draws a random
number $b(s_i)$ according to the distribution $\Omega_L(d)$
and generates $b(s_i)$ copies of its source packet $x_{s_i}$ with
its ID and a counter $c(x_{s_i})$ with initial value zero in
the packet header, and sends each of them to one of
$s_i$'s neighbors chosen uniformly at random.

(ii) Pre-coding Phase: 

(1) Each node of the remaining $n - k$ non-source
nodes chooses to serve as a redundant node with
probability $(m - k)/(n - k)$. We call these redundant nodes
and the original source nodes pre-coding output
nodes. Each pre-coding output node $w_j$ generates
a random number $a(w_j)$ according to the
distribution $\Omega_c(d)$ given by
$\Omega_c(d) = \Pr(a(w) = d) = \binom{k}{d} \left(\frac{E[b]}{m}\right)^d \left(1 - \frac{E[b]}{m}\right)^{k-d}$,
where $E[b] = \sum_b b\, \Omega_L(b)$.

(2) Each node that has packets in its forward queue
before the current round sends the head-of-line packet
to one of its neighbors chosen uniformly at random.

(3) When a node u receives a packet x with counter
$c(x) < C_1 n \log n$ ($C_1$ is a system parameter), the
node u puts the packet into its forward queue and
updates the counter as $c(x) = c(x) + 1$.

(4) Each pre-coding output node w accepts the first $a(w)$
copies of $a(w)$ different source packets with counters
$c(x) \ge C_1 n \log n$, and updates w's pre-coding result
each time as $y_w^+ = y_w^- \oplus x$. If a copy of $x_s$ is accepted,
the copy will not be forwarded any more, and w will
not accept any other copy of $x_s$. When the node w
finishes $a(w)$ updates, $y_w$ is the pre-coding output of
w.

(iii) Raptor-coding Phase: 



(1) Each pre-coding output node $o_j$ puts its ID and a
counter $c(y_{o_j})$ with initial value zero in the packet
header, and sends out its pre-coding output packet $y_{o_j}$
to one of its neighbors u, chosen uniformly at random
among all its neighbors $\mathcal{N}(o_j)$.

(2) The node u accepts this pre-coding output packet $y_{o_j}$
with probability $d_c(u)/m$ and updates its storage as $z_u^+ =
z_u^- \oplus y_{o_j}$. Whether or not the packet is accepted,
the node u puts it into its forward queue and sets
the counter of $y_{o_j}$ as $c(y_{o_j}) = 1$.

(3) In each round, when a node u has at least one
pre-coding output packet in its forward queue before the
current round, u forwards the head-of-line packet y
in its forward queue to one of its neighbors v, chosen
uniformly at random among all its neighbors $\mathcal{N}(u)$.

(4) Depending on how many times y has visited v, the
node v makes its decisions:

• If it is the first time that y visits v, then the node
v accepts this packet with probability $d_c(v)/m$
and updates its storage as $z_v^+ = z_v^- \oplus y$.

• If y has visited v before and $c(y) < C_1 n \log n$,
then the node v accepts this packet with
probability 0.

• Whether or not y is accepted, the node v puts it
into its forward queue and increases the counter of
y by one: $c(y) = c(y) + 1$.

• If y has visited v before and $c(y) \ge C_1 n \log n$, then
the node v discards packet y forever.

(iv) Storage Phase: When a node u has made its decisions for
all the pre-coding output packets $y_{o_1}, y_{o_2}, \ldots, y_{o_m}$, i.e., all
these packets have visited the node u at least once, the
node u finishes its encoding process and $z_u$ is the storage
packet of u.
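
The per-node decision rule of the Raptor-coding Phase can be sketched as a small event handler. This is a minimal sketch, not the authors' implementation: the class and function names are hypothetical, payloads are modeled as integers, and the acceptance probability $d_c(v)/m$ is an assumption chosen so that a node accepts about $d_c(v)$ of the m pre-coding output packets.

```python
import random
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Packet:
    source_id: int    # ID of the pre-coding output node that emitted it
    payload: int      # packet content, modeled as an integer bit pattern
    counter: int = 0  # c(y): number of forward transmissions so far


@dataclass
class Node:
    code_degree: int  # d_c(v), drawn during the Initialization Phase
    storage: int = 0  # z_v, the running XOR of accepted packets
    seen: Set[int] = field(default_factory=set)


def on_receive(node: Node, pkt: Packet, m: int, threshold: int,
               rng: random.Random) -> bool:
    """Apply step (4); return True if the packet must be forwarded again."""
    if pkt.source_id not in node.seen:
        node.seen.add(pkt.source_id)
        # First visit: accept with (assumed) probability d_c(v) / m.
        if rng.random() < node.code_degree / m:
            node.storage ^= pkt.payload
    elif pkt.counter >= threshold:
        return False  # repeat visit after the threshold: discard forever
    pkt.counter += 1  # forwarded whether or not it was accepted
    return True


node = Node(code_degree=4)
pkt = Packet(source_id=1, payload=0b1010)
forward_again = on_receive(node, pkt, m=4, threshold=1, rng=random.Random(0))
```

A node has finished encoding (the Storage Phase) once `len(node.seen)` equals m, at which point `node.storage` corresponds to $z_u$.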

The RCDS-I algorithm achieves the same decoding perfor- 
mance as Raptor codes. Due to the space limitation, all the 
proofs for the theorems and lemmas are omitted. 

Theorem 3: Suppose sensor networks have n nodes and k
sources, and let $k/m = (1 + \epsilon/2)/(1 + \epsilon)$. When n and k are
sufficiently large, the k original source packets can be recovered
from $(1 + \epsilon)k$ storage packets. The decoding complexity is
$O(k \log(1/\epsilon))$.
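
As a quick worked example of the quantities in Theorem 3, the sketch below computes the number of pre-coding outputs m and the number of storage packets to query for given k and $\epsilon$. The function name and the rounding up to integers are assumptions; the theorem itself states the ratio exactly.

```python
import math
from typing import Tuple


def rcds_parameters(k: int, eps: float) -> Tuple[int, int]:
    """Return (m, packets_needed) for k sources and overhead eps.

    m solves k/m = (1 + eps/2)/(1 + eps); both values are rounded up,
    which is an illustrative choice rather than part of the theorem.
    """
    m = math.ceil(k * (1 + eps) / (1 + eps / 2))
    packets_needed = math.ceil((1 + eps) * k)
    return m, packets_needed


m, packets_needed = rcds_parameters(100, 0.5)
```

For k = 100 and $\epsilon$ = 0.5 this gives m = 120 pre-coding output nodes and 150 storage packets to query.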

The price for the benefits we achieved in the RCDS-I 
algorithm is the extra transmissions. The total number of 
transmissions (the total number of steps of k random walks) 
is given in the following theorem. 

Theorem 4: Denote by $T^{(I)}_{RCDS}$ the total number of
transmissions of the RCDS-I algorithm, then we have

$$T^{(I)}_{RCDS} = \Theta(k n \log n) + \Theta(m n \log n), \qquad (2)$$

where k is the total number of sources before pre-coding, m
is the total number of outputs after pre-coding, and n is the
total number of nodes in the network.

B. With no Global Information — RCDS-II 

In the RCDS-I algorithm, we assume that each node in the
network knows n and k, the total number of nodes and
sources. However, in many scenarios, especially when
network topologies may change due to node mobility or node
failures, the exact value of n may not be available to all
nodes. On the other hand, the number of sources k usually
depends on the environment measurements, or some events,
and thus the exact value of k may not be known by each
node either. As a result, designing a fully distributed storage
algorithm that does not require any global information is
very important and useful. In this subsection, we propose such
an algorithm based on Raptor codes, called RCDS-II. The idea
behind this algorithm is to use some features of simple
random walks to infer individual estimates
of n and k at each node.

To begin, we introduce the definition of inter-visit time and 
inter-packet time. For a random walk on any graph, the inter- 
visit time is defined as follows [8], [9]: 

Definition 4: (Inter-Visit Time) For a random walk on a
graph, the inter-visit time of node u, $T_{visit}(u)$, is the amount
of time between any two consecutive visits of the random walk
to node u. This inter-visit time is also called return time.

For a simple random walk on random geometric graphs, the 
following lemma provides results on the expected inter-visit 
time of any node. 

Lemma 5: For a node u with node degree $d_n(u)$ in a
random geometric graph, the mean inter-visit time is
given by

$$E[T_{visit}(u)] = \frac{\mu n}{d_n(u)}, \qquad (3)$$

where $\mu$ is the mean degree of the graph.

From Lemma 5, we can see that if each node u can
measure the expected inter-visit time $E[T_{visit}(u)]$, then the
total number of nodes n can be estimated by

$$\hat{n}'(u) = \frac{d_n(u)\, E[T_{visit}(u)]}{\mu}. \qquad (4)$$

However, the mean degree $\mu$ is global information and may
be hard to obtain. Thus, we make a further approximation and
let the estimation of n by the node u be

$$\hat{n}(u) = E[T_{visit}(u)]. \qquad (5)$$
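
The relation between return time, degree, and network size can be checked on a toy graph. The sketch below uses a five-node path instead of a random geometric graph, and hypothetical function names; it relies on the fact that $\mu n$ equals the sum of all node degrees, so the mean return time of a simple random walk to u is the degree sum divided by $d_n(u)$.

```python
import random
from typing import Dict, List


def expected_return_time(adj: Dict[int, List[int]], u: int) -> float:
    """mu * n / d_n(u), computed as (sum of degrees) / d_n(u)."""
    degree_sum = sum(len(neighbors) for neighbors in adj.values())
    return degree_sum / len(adj[u])


def simulated_return_time(adj: Dict[int, List[int]], u: int, visits: int,
                          rng: random.Random) -> float:
    """Average time between consecutive visits of one random walk to u."""
    node, steps, total, seen = u, 0, 0, 0
    while seen < visits:
        node = rng.choice(adj[node])
        steps += 1
        if node == u:
            total += steps
            steps = 0
            seen += 1
    return total / visits


# Path 0 - 1 - 2 - 3 - 4: n = 5, mean degree 8/5.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
end_estimate = simulated_return_time(path, 0, 20000, random.Random(1))
```

On this path the formula gives 8 for an end node and 4 for an interior node, while n = 5. This shows the effect of dropping the factor $d_n(u)/\mu$ in (5): the estimate is accurate only for nodes whose degree is close to the mean degree.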



In our distributed storage algorithms, each source packet
follows a simple random walk. Since there are k sources, we
have k individual simple random walks in the network. For
a particular random walk, the behavior of the return time is
characterized by Lemma 5. Lemma 6, in turn, provides
results on the inter-visit time among all k random walks, which
is called inter-packet time for our algorithm and defined as
follows:

Definition 5: (Inter-Packet Time) For k random walks on
a graph, the inter-packet time of node u, $T_{packet}(u)$, is the
amount of time between any two consecutive visits of those
k random walks to node u.

Lemma 6: For a node u with node degree $d_n(u)$ in a
random geometric graph with k simple random walks, the
mean inter-packet time is given by

$$E[T_{packet}(u)] = \frac{E[T_{visit}(u)]}{k} = \frac{\mu n}{k\, d_n(u)}, \qquad (6)$$

where $\mu$ is the mean degree of the graph.

From Lemma 5 and Lemma 6, it is easy to see that for any
node u, an estimation of k can be obtained by

$$\hat{k}(u) = \frac{E[T_{visit}(u)]}{E[T_{packet}(u)]}. \qquad (7)$$

After obtaining estimations for both n and k, we can employ
techniques similar to those used in RCDS-I to perform Raptor coding
and storage. We will only present details of the Inference
Phase due to the space limitation. The Initialization Phase,
Pre-coding Phase, Raptor-coding Phase and Storage Phase are
the same as in RCDS-I with replacements of k by $\hat{k}(u)$ and
n by $\hat{n}(u)$ everywhere.

Inference Phase: 

(1) For each node u, suppose $x_{s(u)_1}$ is the first source packet
that visits u, and denote by $t^{(j)}_{s(u)_1}$ the time when $x_{s(u)_1}$
has its j-th visit to the node u. Meanwhile, each node
u also maintains a record of visiting times for each other
source packet $x_{s(u)_i}$ that visits it. Let $t^{(j)}_{s(u)_i}$ be the time
when source packet $x_{s(u)_i}$ has its j-th visit to the node
u. After $x_{s(u)_1}$ has visited the node u $C_2$ times, where $C_2$ is a
system parameter (a positive constant), the node
u stops this monitoring and recording procedure. Denote
by $k(u)$ the number of source packets that have visited u
at least once up to that time.

(2) For each node u, let $J(s(u)_i)$ be the number of
visits of source packet $x_{s(u)_i}$ to the node u, and let
$T_{s(u)_i}$ be the average time between consecutive visits of
$x_{s(u)_i}$ to u, computed from the recorded times $t^{(j)}_{s(u)_i}$,
$j = 1, \ldots, J(s(u)_i)$. Let $J_{ii'} = \min\{J(s(u)_i), J(s(u)_{i'})\}$, and
let $T_{s(u)_i s(u)_{i'}}$ be the average time between the j-th visits
$t^{(j)}_{s(u)_i}$ and $t^{(j)}_{s(u)_{i'}}$ of two different source packets, taken
over $j = 1, \ldots, J_{ii'}$. Then, the average
inter-visit time and inter-packet time for node u are given by

$$\bar{T}_{visit}(u) = \frac{1}{k(u)} \sum_{i=1}^{k(u)} T_{s(u)_i}, \qquad \bar{T}_{packet}(u) = \frac{1}{k(u)(k(u) - 1)} \sum_{i \ne i'} T_{s(u)_i s(u)_{i'}},$$

respectively. Then the node u can estimate the total number of
nodes in the network and the total number of sources as

$$\hat{n}(u) = \bar{T}_{visit}(u), \qquad \hat{k}(u) = \frac{\bar{T}_{visit}(u)}{\bar{T}_{packet}(u)}.$$

(3) In this phase, the counter $c(x_{s_i})$ of each source packet
$x_{s_i}$ is incremented by one after each transmission.
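
The estimation step can be illustrated on a node's visit log. The sketch below is a simplification under stated assumptions: the function name and log format are hypothetical, the inter-visit time is the mean of each packet's average gap, and the inter-packet time is computed directly from Definition 5 (gaps between consecutive visits by any packet) rather than by the pairwise averaging of step (2).

```python
from typing import Dict, List, Tuple


def estimate_n_and_k(visit_times: Dict[int, List[int]]) -> Tuple[float, float]:
    """Return (n_hat, k_hat) from one node's record of packet visits.

    visit_times maps a source-packet ID to the sorted times at which
    that packet visited the node. At least one packet must have
    visited twice, and at least two visits must be recorded overall.
    """
    per_packet_means = []
    for times in visit_times.values():
        if len(times) >= 2:
            gaps = [b - a for a, b in zip(times, times[1:])]
            per_packet_means.append(sum(gaps) / len(gaps))
    t_visit = sum(per_packet_means) / len(per_packet_means)

    # Inter-packet time: gaps between consecutive visits of any packet.
    merged = sorted(t for times in visit_times.values() for t in times)
    packet_gaps = [b - a for a, b in zip(merged, merged[1:])]
    t_packet = sum(packet_gaps) / len(packet_gaps)

    return t_visit, t_visit / t_packet


# Two packets, each returning every 10 time units, offset by 5.
log = {1: [0, 10, 20, 30], 2: [5, 15, 25]}
n_hat, k_hat = estimate_n_and_k(log)
```

In this toy log each packet returns every 10 time units and some packet arrives every 5, so the node estimates n as 10 and k as 2, matching (5) and (7).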



IV. Performance Evaluation 

In this section, we study the performance of the proposed 
RCDS-I and RCDS-II algorithms for distributed storage in 
wireless sensor networks through simulation. The main per- 
formance metric we investigate is the successful decoding 
probability versus the decoding ratio. 

Definition 6: (Decoding Ratio) Decoding ratio $\eta$ is the ratio
between the number of querying nodes h and the number of
sources k, i.e., $\eta = h/k$.

Definition 7: (Successful Decoding Probability) Successful
decoding probability $P_s$ is the probability that the k source
packets are all recovered from the h querying nodes.

Fig. 3. Decoding performance of the RCDS-I algorithm: (a) small number
of nodes and sources; (b) large number of nodes and sources.

Fig. 4. Decoding performance comparison of the RCDS-I and RCDS-II
algorithms: (a) small number of nodes and sources; (b) large number of nodes
and sources.

Fig. 5. Impact of system parameters: (a) decoding performance of the RCDS-I
algorithm with different $C_1$; (b) decoding performance of the RCDS-II algorithm
with different $C_2$.

In our simulation, $P_s$ is evaluated as follows. Suppose the
network has n nodes and k sources, and we query h nodes.
There are $\binom{n}{h}$ ways to choose such h nodes, and we choose
$M = \frac{1}{10}\binom{n}{h} = \frac{n!}{10\, h!\,(n-h)!}$ uniformly random samples of the
choices of query nodes. Let $M_s$ be the number of samples of
the choices of query nodes from which the k source packets
can be recovered. Then, the successful decoding probability is
evaluated as $P_s = M_s / M$.
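
This sampling procedure can be sketched as a small Monte Carlo routine. The code below is an illustration under stated assumptions: each storage packet is modeled only by the set of block indices it combines, decodability is checked with a plain peeling (BP) decoder on those sets without the pre-code stage, and the sample count is a free parameter rather than a fixed fraction of all choices. All names are hypothetical.

```python
import random
from typing import List, Sequence, Set


def peel_decodable(packets: Sequence[Set[int]], k: int) -> bool:
    """Peeling check: can all k blocks be recovered from these packets?"""
    pending: List[Set[int]] = [set(p) for p in packets]
    known: Set[int] = set()
    progress = True
    while progress and len(known) < k:
        progress = False
        for p in pending:
            p -= known            # remove blocks that are already decoded
            if len(p) == 1:       # degree-one packet releases a new block
                known |= p
                progress = True
    return len(known) == k


def success_probability(storage: Sequence[Set[int]], k: int, h: int,
                        samples: int, rng: random.Random) -> float:
    """Estimate P_s = M_s / M over `samples` random choices of h nodes."""
    successes = 0
    for _ in range(samples):
        query = rng.sample(range(len(storage)), h)
        if peel_decodable([storage[i] for i in query], k):
            successes += 1
    return successes / samples


# Four storage nodes holding combinations of k = 3 blocks.
storage = [{0}, {0, 1}, {1, 2}, {2}]
p_all = success_probability(storage, k=3, h=4, samples=20,
                            rng=random.Random(3))
```

Sweeping h for fixed k and plotting the estimate against h/k produces curves of the same kind as those reported in the figures.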

Our simulation results are shown in Figs. 3, 4, and 5.
Fig. 3 shows the decoding performance of the RCDS-I algorithm
with different numbers of nodes and sources. The network is
deployed in $A = [5, 5]^2$, and the system parameter $C_1$ is
set as $C_1 = 5$. From the simulation results we can see that
when the decoding ratio is above 2, the successful decoding
probability is about 95%. Another observation is that when the
total number of nodes increases but the ratio between k and n
and the decoding ratio $\eta$ are kept constant, the successful
decoding probability $P_s$ increases when $\eta > 1.4$ and decreases
when $\eta < 1.4$. That is because the more nodes we have,
the more likely each node has the desired degree distribution.
Fig. 4 compares the decoding performance of the RCDS-II and
RCDS-I algorithms. To guarantee that each node obtains accurate
estimations of n and k, we set $C_2 = 50$. It can be seen that
the decoding performance of the RCDS-II algorithm is slightly
worse than that of the RCDS-I algorithm when the decoding ratio $\eta$
is small, and almost the same when $\eta$ is large. To investigate
how the system parameters $C_1$ and $C_2$ affect the decoding
performance of the RCDS-I and RCDS-II algorithms, we fix
the decoding ratio $\eta$ and vary $C_1$ and $C_2$. The simulation
results are shown in Fig. 5. It can be seen that when $C_1 > 4$,
$P_s$ stays almost constant, which indicates that after
$4 n \log n$ steps, almost all source packets have visited each node at
least once. We can also see that when $C_2$ is chosen to be
small, the performance of the RCDS-II algorithm is very poor.
This is due to the inaccurate estimations of k and n at each
node. When $C_2$ is large, for example, when $C_2 > 40$, the
performance is almost the same.

V. Conclusion 

In this paper, we studied Raptor codes based distributed 
storage algorithms for large-scale wireless sensor networks. 
We proposed two new decentralized algorithms RCDS-I and 
RCDS-II that distribute information sensed by k source nodes 
to n nodes for storage based on Raptor codes. In RCDS-I, 
each node has limited global information; while in RCDS-II, 
no global information is required. We computed the compu- 
tational encoding and decoding complexity, and transmission 
costs of these algorithms. We also evaluated their performance 
by simulation. 